Take Home Exercise 2

Author

Georgia Ng

Published

September 26, 2024

Modified

October 10, 2024

1. Overview

1.1 Introduction

Drug abuse is associated with significant negative health, financial and social consequences. Yet, illicit drug consumption remains highly prevalent and continues to be a growing problem worldwide. In 2021, 1 in 17 people aged 15–64 in the world had used a drug in the past 12 months. Notwithstanding population growth, the estimated number of drug users grew from 240 million in 2011 to 296 million in 2021.

The geopolitics of Thailand which is near the Golden Triangle of Indochina, the largest drug production site in Asia, and the constant transportation infrastructure development made Thailand became market and transit routes for drug trafficking to the third countries.

In Thailand, drug abuse is one of the major social issue. There are about 2.7 million youths using drugs in Thailand. Among youths aged between 15 and 19 years, there are about 300,000 who have needs for drug treatment. Most of Thai youths involved with drugs are vocational-school students, which nearly doubles in number compared to secondary-school students.

In particular, we will only be focusing on drug use cases since

1.2 Goal

In this exercise, we are going to achieve the following tasks:

  • Using appropriate function of sf and tidyverse, preparing the following geospatial data layer:

    • a study area layer in sf polygon features. It must be at province level (including Bangkok) of Thailand.

    • a drug abuse indicators layer within the study area in sf polygon features.

  • Using the extracted data, perform global spatial autocorrelation analysis by using sfdep methods.

  • Using the extracted data, perform local spatial autocorrelation analysis by using sfdep methods.

  • Describe the spatial patterns revealed by the analysis above.

1.3 Importing of Packages

Before we start off, we will have to import the necessary packages required for us to conduct our analysis.

We will be using the following packages:

  • sf: Manages spatial vector data, enabling operations like spatial joins, buffering, and transformations for points, lines, and polygons.

  • tmap: Creates and customizes thematic maps for spatial data visualization, including static and interactive maps with various map elements.

  • tidyverse: A suite of packages for data manipulation (dplyr), visualization (ggplot2), and tidying (tidyr), facilitating a streamlined workflow for data analysis.

  • ggplot2: A powerful data visualization package within the tidyverse, allowing the creation of complex and customizable graphs. It supports a wide range of geospatial plotting when combined with sf objects, enabling seamless integration of spatial data in layered plots like points, polygons, and lines.

  • sfdep: Facilitates spatial dependence analysis by providing tools to compute spatial weights, autocorrelation statistics, and other spatial econometric measures. It integrates with the sf package, allowing users to easily conduct spatial analysis on spatial data frames.

pacman::p_load(sf, tmap, tidyverse, sfdep, ggplot2,RColorBrewer)

1.4 Dataset

For the purpose of this take-home exercise, two data sets shall be used, they are:

  • Thailand Drug Offenses [2017-2022] at Kaggle.

  • Thailand - Subnational Administrative Boundaries at HDX. You are required to use the province boundary data set. Thailand administrative level 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon) boundaries

2. Data Wrangling

2.1 Importing Thailand Drug Offenses

This line of code imports the Thailand Drug Offenses dataset from Kaggle while using the select() function to remove the province_th column which is not useful to us. For this dataset, we have a total of 7392 rows.

drug_offenses_sf <- read_csv("data/aspatial/drug_offence/thai_drug_offenses_2017_2022.csv") %>% select(-province_th)
summary(drug_offenses_sf)
  fiscal_year   types_of_drug_offenses    no_cases       province_en       
 Min.   :2017   Length:7392            Min.   :    0.0   Length:7392       
 1st Qu.:2018   Class :character       1st Qu.:    1.0   Class :character  
 Median :2020   Mode  :character       Median :   70.0   Mode  :character  
 Mean   :2020                          Mean   :  535.3                     
 3rd Qu.:2021                          3rd Qu.:  623.0                     
 Max.   :2022                          Max.   :17131.0                     

2.2 Importing Thailand - Subnational Administrative Boundaries

This line of code imports the Thailand - Subnational Administrative Boundaries dataset using st_read(). As shown, there is a total of 77 geographic features, representing the 76 provinces in Thailand as well as one additional special administrative area (Bangkok).

thailand_province_sf <- st_read(dsn="data/geospatial/tha_adm_rtsd_itos_20210121_shp/", 
                   layer="tha_admbnda_adm1_rtsd_20220121") %>%
  select(ADM1_EN, geometry)  
Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source 
  `/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex02/data/geospatial/tha_adm_rtsd_itos_20210121_shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84

This line of code checks if all the polygons are valid.

st_is_valid(thailand_province_sf)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE

This line of code displays the study area.

tm_shape(thailand_province_sf) +
  tm_polygons() 

2.3 Data Cleansing & Extraction

2.3.1 Extracting Data

To see the different types of drug offenses currently in the dataset, we can use unique(). From this we can see that there a total of 16 types of different offenses.

unique(drug_offenses_sf$types_of_drug_offenses)
 [1] "drug_use_cases"                                        
 [2] "suspects_in_drug_use_cases"                            
 [3] "possession_cases"                                      
 [4] "suspects_in_possession_cases"                          
 [5] "possession_with_intent_to_distribute_cases"            
 [6] "suspects_in_possession_with_intent_to_distribute_cases"
 [7] "trafficking_cases"                                     
 [8] "suspects_in_trafficking_cases"                         
 [9] "production_cases"                                      
[10] "suspects_in_production_cases"                          
[11] "import_cases"                                          
[12] "suspects_in_import_cases"                              
[13] "export_cases"                                          
[14] "suspects_in_export_cases"                              
[15] "conspiracy_cases"                                      
[16] "suspects_in_conspiracy_cases"                          

For this study, we will only be focusing on drug use cases which we will use filter() to do so, limiting the scope of this project to only that.

drug_use_sf <-drug_offenses_sf%>%filter(types_of_drug_offenses=="drug_use_cases")
summary(drug_use_sf)
  fiscal_year   types_of_drug_offenses    no_cases       province_en       
 Min.   :2017   Length:462             Min.   :   32.0   Length:462        
 1st Qu.:2018   Class :character       1st Qu.:  798.2   Class :character  
 Median :2020   Mode  :character       Median : 1403.5   Mode  :character  
 Mean   :2020                          Mean   : 1981.7                     
 3rd Qu.:2021                          3rd Qu.: 2440.2                     
 Max.   :2022                          Max.   :16480.0                     

2.3.2 Performing relational join

2.3.2.1 Failed Attempt

To fix and standardize the data for consistent matching, we will use toupper() to ensure the columns ADM1_EN of thailand_province_sf and province_en of drug_use_sf can match correctly.

drug_use_sf$province_en <- toupper(drug_use_sf$province_en)
thailand_province_sf$ADM1_EN <- toupper(thailand_province_sf$ADM1_EN)

And to ensure that the changes are done correctly, we can use head() for this. As seen here, the province names are all now in upper cases.

head(drug_use_sf)
# A tibble: 6 × 4
  fiscal_year types_of_drug_offenses no_cases province_en             
        <dbl> <chr>                     <dbl> <chr>                   
1        2017 drug_use_cases            11871 BANGKOK                 
2        2017 drug_use_cases              200 CHAI NAT                
3        2017 drug_use_cases              553 NONTHABURI              
4        2017 drug_use_cases              450 PATHUM THANI            
5        2017 drug_use_cases              378 PHRA NAKHON SI AYUTTHAYA
6        2017 drug_use_cases              727 LOBURI                  
head(thailand_province_sf)
Simple feature collection with 6 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 100.1913 ymin: 13.47842 xmax: 100.9639 ymax: 14.80246
Geodetic CRS:  WGS 84
                   ADM1_EN                       geometry
1                  BANGKOK MULTIPOLYGON (((100.6139 13...
2             SAMUT PRAKAN MULTIPOLYGON (((100.7306 13...
3               NONTHABURI MULTIPOLYGON (((100.3415 14...
4             PATHUM THANI MULTIPOLYGON (((100.8916 14...
5 PHRA NAKHON SI AYUTTHAYA MULTIPOLYGON (((100.5131 14...
6                ANG THONG MULTIPOLYGON (((100.3332 14...

The code chunk below will be used to update the attribute table of drug_use_sf and thailand_province_sf dataframe. This is performed by using left_join() of dplyr package.

combined_drug_use_sf <- left_join(thailand_province_sf, drug_use_sf,
                          by =  c("ADM1_EN" = "province_en"))

After performing the left_join() between thailand_province_sf and drug_use_sf, we can usesummary() to examine the resulting dataset. The output indicated that the combined dataset contains only 452 rows as compared to the original 462 rows for drug_use_sf.

This suggests that there were rows in drug_use_sf that did not successfully join with thailand_province_sf.

summary(combined_drug_use_sf)
   ADM1_EN           fiscal_year   types_of_drug_offenses    no_cases      
 Length:452         Min.   :2017   Length:452             Min.   :   32.0  
 Class :character   1st Qu.:2018   Class :character       1st Qu.:  788.8  
 Mode  :character   Median :2020   Mode  :character       Median : 1399.5  
                    Mean   :2020                          Mean   : 1994.2  
                    3rd Qu.:2021                          3rd Qu.: 2444.5  
                    Max.   :2022                          Max.   :16480.0  
                    NA's   :2                             NA's   :2        
          geometry  
 MULTIPOLYGON :452  
 epsg:4326    :  0  
 +proj=long...:  0  
                    
                    
                    
                    

2.3.2.2 Fixing the Mismatch in Province Names

The most likely reasons for this could be a mismatch in certain Province Names, hence to effectively identified the province names that do not match between the two datasets, we can utilize anti_join() to do so, finding the rows in each that did not manage to join in each dataset. This provides a straightforward comparison of the discrepancies between the two datasets.

# Get rows in drug_use_sf that didn't match with thailand_province_sf
unmatched_in_drug_use <- anti_join(drug_use_sf, thailand_province_sf, 
                                        by = c("province_en" = "ADM1_EN"))

# Get rows in thailand_province_sf that didn't match with drug_offenses_sf
unmatched_in_thailand_province <- anti_join(thailand_province_sf, drug_use_sf, 
                                            by = c("ADM1_EN" = "province_en"))

# Display mismatched values
unmatched_province_en <- unique(unmatched_in_drug_use$province_en)
unmatched_adm1_en <- unique(unmatched_in_thailand_province$ADM1_EN)

# View the mismatched sets
list("Unmatched in drug_use_sf" = unmatched_province_en,
     "Unmatched in thailand_province_sf" = unmatched_adm1_en)
$`Unmatched in drug_use_sf`
[1] "LOBURI"  "BUOGKAN"

$`Unmatched in thailand_province_sf`
[1] "LOP BURI"  "BUENG KAN"

As shown, there is a mismatch in the naming of the two provinces in each dataset although they are referring to the same provinces, hence we will need to standardise that. Since “Lopburi” and “Bueng Kan” were the names used in wikipedia, we will just be sticking with that.

This code performs standardization of province names in two datasets: drug_offenses_sf and thailand_province_sf, using recode() to change specific values.

# Update the values in the province_en column
drug_use_sf <- drug_use_sf %>%
  mutate(province_en = recode(province_en,
                               "LOBURI" = "LOPBURI",
                               "BUOGKAN" = "BUENG KAN"))

# Update the values in the ADM1_EN column
thailand_province_sf <- thailand_province_sf %>%
  mutate(ADM1_EN = recode(ADM1_EN,
                          "LOP BURI" = "LOPBURI",
                          "BUENG KAN" = "BUENG KAN"))

2.3.2.3 Reattempt At Performing Relational Join

Finally, let us try to perform the relational join again. As seen, we have a total of 462 rows which aligns with the total number of rows that is in drug_offenses_sf and for us to easier refer to it, we will rename the column to province.

combined_drug_use_sf <- left_join(thailand_province_sf, drug_use_sf,
                          by =  c("ADM1_EN" = "province_en")) %>%
  rename(province = ADM1_EN)

summary(combined_drug_use_sf)
   province          fiscal_year   types_of_drug_offenses    no_cases      
 Length:462         Min.   :2017   Length:462             Min.   :   32.0  
 Class :character   1st Qu.:2018   Class :character       1st Qu.:  798.2  
 Mode  :character   Median :2020   Mode  :character       Median : 1403.5  
                    Mean   :2020                          Mean   : 1981.7  
                    3rd Qu.:2021                          3rd Qu.: 2440.2  
                    Max.   :2022                          Max.   :16480.0  
          geometry  
 MULTIPOLYGON :462  
 epsg:4326    :  0  
 +proj=long...:  0  
                    
                    
                    

2.3 Drug Use Cases Visualisation

# Step 1: Aggregate data by year, summing the number of cases
drug_use_by_year <- combined_drug_use_sf %>% select(fiscal_year, no_cases) %>%
  group_by(fiscal_year) %>%
  summarise(total_cases = sum(no_cases), .groups = 'drop')

write_rds(drug_use_by_year, "data/rds/drug_use_by_year.rds")
# Create the bar plot with exact labels
ggplot(drug_use_by_year, aes(x = factor(fiscal_year), y = total_cases)) +  # Convert fiscal_year to factor
  geom_col(fill = "red", color = "black") +  # Use geom_col for the bar plot
  geom_text(aes(label = total_cases), vjust = -0.5, size = 5) +  # Add text labels on top of bars
  labs(title = "Total Drug Use Cases by Year", x = "Year", y = "Total Cases") +
  theme_minimal() +
  scale_x_discrete(drop = FALSE) + # Ensures all years are shown on the x-axis
  theme(plot.title = element_text(hjust = 0.5))

Note

We can see that 2021 has the most drug use cases recorded out of all these years.

2.4 Geographic Distribution of Drug Use Cases by Province and by Year

Using the tmap methods below, we can create chloropeth maps to visualise the geographic distribution of the drug use cases by province and by year.

# Set tmap mode to view or plot
tmap_mode("plot")

tm_shape(combined_drug_use_sf) +
  tm_polygons("no_cases", title = "Drug Use Cases", style = "quantile", palette = "Reds", n=5) +
  tm_facets(by = "fiscal_year", nrow = 2, ncol = 3) +  # Adjust rows and columns as needed
    tm_layout(
    main.title = "Geographic Distribution of Drug Use Cases by Province and by Year",  # Main title for the entire plot
    legend.position = c("left", "top"),
    legend.text.size = 1.2,
    main.title.size = 1.5,  
    main.title.position = "center"
  )

3. Global Spatial Autocorrelation Analysis

# Filter for each year from 2017 to 2022
drug_use_2017_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2017)
drug_use_2018_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2018)
drug_use_2019_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2019)
drug_use_2020_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2020)
drug_use_2021_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2021)
drug_use_2022_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2022)

3.1 Computing Contiguity Spatial Weights (QUEEN)

Before calculating global spatial autocorrelation statistics, we first need to construct spatial weights for the study area. These weights define the neighborhood relationships between the geographical units (such as counties) within the study region. To do so, we will use st_conguity() of the sfdep package, the equivalent of poly2nb() function from the spdep package. This function creates a list of neighboring regions based on shared boundaries and is needed for when we conduct the subsequent tests.

In particular, we are using the Queen contiguity criterion, which defines neighbors as regions that share either a boundary or a vertex. This method is often preferred for creating spatial weights because it includes more neighboring units than the Rook criterion (which only considers shared boundaries).

The code chunk below is used to compute Queen contiguity weight matrix.

nb_list <- st_contiguity(thailand_province_sf, queen = TRUE)

write_rds(nb_list, "data/rds/nb_list.rds")
summary(nb_list)
Neighbour list object:
Number of regions: 77 
Number of nonzero links: 352 
Percentage nonzero weights: 5.93692 
Average number of links: 4.571429 
1 region with no links:
67
2 disjoint connected subgraphs
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 
 1  1  5 17 15 17 10  5  4  2 
1 least connected region:
14 with 1 link
2 most connected regions:
29 51 with 9 links
Note

The most connected area has 9 neighbours. There are two area units with only one neighbours.

  • The summary report above shows that there are 77 area units in Thailand.

  • The least connected province is Trat, with only 1 neighbour.

  • The most connected provinces on the other hand, are Khon Kaen and Tak, with 9 neighbours.

  • A province with no neighbour is Phuket.

For reference on where these provinces of interest are.

Click to view the code
# Define the regions and corresponding colors
highlight_info <- c(
  "TRAT" = "yellow",     
  "KHON KAEN" = "green",
  "TAK" = "blue",
  "PHUKET" = "red"
)

# Set tmap mode to plot
tmap_mode("plot")  # Use "plot" for static maps

# Create the base map with all provinces
base_map <- tm_shape(thailand_province_sf) +
  tm_polygons(col = "lightgray", title = "Province Highlight")

# Initialize combined map with the base map
combined_map <- base_map

# Loop through the highlight_info to add each region
for (region in names(highlight_info)) {
  combined_map <- combined_map + 
    tm_shape(thailand_province_sf[thailand_province_sf$ADM1_EN == region, ]) +
    tm_polygons(col = highlight_info[region], 
                 alpha = 0.7)  # No title needed here
}

# Manually add a legend using tm_add_legend
legend_elements <- lapply(names(highlight_info), function(region) {
  list(label = region, col = highlight_info[region])
})

# Add layout options and custom legend
combined_map +
  tm_add_legend(type = "fill", 
                labels = names(highlight_info), 
                col = highlight_info, 
                title = "Highlighted Provinces") +
  tm_layout(
    title = "Highlighted Provinces in Thailand", 
    legend.position = c("right", "bottom"),
    title.position = c("center", "top"),
    legend.title.size = 1.2,  # Adjust legend title size
    legend.text.size = 1      # Adjust legend text size
  )

To further investigate why Phuket has no neighboring provinces, we can examine the zoomed-in map of the region. The map clearly illustrates that Phuket is an island, situated away from the mainland. This geographical separation accounts for its lack of direct neighboring provinces.

Click to view the code
# Filter the dataset for Phuket province
phuket_sf <- thailand_province_sf %>% 
  filter(ADM1_EN == "PHUKET")

# Get bounding box for Phuket and slightly expand it to include surrounding provinces
bbox_phuket <- st_bbox(phuket_sf)

# Expand the bounding box by a certain factor (e.g., 0.1 degrees in each direction)
bbox_phuket_expanded <- bbox_phuket
bbox_phuket_expanded[1] <- bbox_phuket[1] - 0.1  # xmin
bbox_phuket_expanded[3] <- bbox_phuket[3] + 0.1  # xmax
bbox_phuket_expanded[2] <- bbox_phuket[2] + 0.1  # ymin (correcting direction)
bbox_phuket_expanded[4] <- bbox_phuket[4] + 0.1  # ymax

# Create a temporary dataset for plotting
phuket_plot_data <- thailand_province_sf %>%
  mutate(is_phuket = ifelse(ADM1_EN == "PHUKET", "Phuket", "Other Provinces"))

# Set the color palette explicitly for Phuket and other provinces
color_palette <- c("Phuket" = "red", "Other Provinces" = "gray85")

# Plot using tmap and focus on the Phuket region and its surroundings
tmap_mode("plot")  # Set tmap to static plotting mode

tm_shape(phuket_plot_data, bbox = bbox_phuket_expanded) +
  tm_borders() +        # Add borders of your spatial data
  tm_fill(col = "is_phuket", palette = color_palette) +  # Fill Phuket with red, others gray  
  tm_layout(title = "Phuket and Surrounding Provinces", frame = TRUE)  # Zoom in to Phuket region

3.2 Generating Spatial Weights

The chunk of code below utilizes the st_weights() function from the sfdep package to generate spatial weights for each neighboring polygon. Specifically, it constructs a weights matrix based on the contiguity defined by the nb_list object, allowing for the specification of weights style (here set to “W” for row-standardized weights). The allow_zero parameter is set to TRUE, which permits the inclusion of polygons with no neighbors in the analysis, ensuring that all spatial units are represented in the weights matrix.

# nb_list is our neighbors list created using st_contiguity
gm_wt <- st_weights(nb_list, style = "W", allow_zero = TRUE)

3.3 Moran’s I Test

We will be using Moran’s I to compute global spatial autocorrelation instead of Geary’s C because Moran’s I provides a more comprehensive measure of spatial autocorrelation. While Geary’s C focuses on the squared differences between neighboring observations, which can make it sensitive to local variations and outliers, Moran’s I considers the overall correlation between values and their spatial locations. This allows us to capture both positive and negative spatial autocorrelation more effectively.

Furthermore, Moran’s I is well-suited for assessing patterns across larger regions, making it ideal for our analysis of spatial relationships of drug use across provinces in Thailand. The confidence interval for this experiment will be set at 95%.

Hypothesis Testing:

Null Hypothesis: The spatial distribution of drug use cases resemble random distribution.

Alternative Hypothesis: The spatial distribution of drug use cases does not resemble random distribution.

The below line of code uses global_moran_test() of sfdep package to performs Moran’s I statistical testing and global_moran_perm() of sfdep is used to perform permutation testing for Moran’s I statistic. A total of 1000 simulation will be performed. We will also be using seed() to ensure the reproducibility of our code.

global_moran_test(drug_use_2017_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 1.5049, p-value = 0.06618
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.077263424      -0.013333333       0.003624223 
set.seed(1234)
gmc_i_2017<-global_moran_perm(drug_use_2017_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2017

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.077263, observed rank = 922, p-value = 0.156
alternative hypothesis: two.sided

The statistical report for 2017 above show that the p-value is greater than alpha value of 0.05. Hence, we do not have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering.

global_moran_test(drug_use_2018_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 1.673, p-value = 0.04716
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.091283835      -0.013333333       0.003910294 
set.seed(1234)
gmc_i_2018<-global_moran_perm(drug_use_2018_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2018

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.091284, observed rank = 944, p-value = 0.112
alternative hypothesis: two.sided

The statistical report for 2018 above show that the p-value is higher than alpha value of 0.05. Hence, we do not have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering.

global_moran_test(drug_use_2019_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 2.1385, p-value = 0.01624
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.138046280      -0.013333333       0.005010889 
set.seed(1234)
gmc_i_2019<-global_moran_perm(drug_use_2019_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2019

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.13805, observed rank = 972, p-value = 0.056
alternative hypothesis: two.sided

The statistical report for 2019 above show that the p-value is higher than alpha value of 0.05. Hence, we do not have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering. This number (0.13805) is however greater than the previous years, a sign that stronger clustering occurred in this year as compared to the previous.

global_moran_test(drug_use_2020_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 1.382, p-value = 0.08349
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.087091664      -0.013333333       0.005280586 
set.seed(1234)
gmc_i_2020<-global_moran_perm(drug_use_2020_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2020

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.087092, observed rank = 924, p-value = 0.152
alternative hypothesis: two.sided

The statistical report for 2020 above show that the p-value is higher than alpha value of 0.05. Hence, we do not have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering.

global_moran_test(drug_use_2021_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 2.862, p-value = 0.002105
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.20376109       -0.01333333        0.00575372 
set.seed(1234)
gmc_i_2021<-global_moran_perm(drug_use_2021_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2021

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.20376, observed rank = 993, p-value = 0.014
alternative hypothesis: two.sided

The statistical report for 2021 above show that the p-value is lower than alpha value of 0.05. Hence, we have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering. Overall, this shows a weak but significant positive spatial autocorrelation in drug use cases.

global_moran_test(drug_use_2022_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 2.7688, p-value = 0.002813
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.197931897      -0.013333333       0.005822019 
set.seed(1234)
gmc_i_2022<-global_moran_perm(drug_use_2022_sf$no_cases,nb_list,gm_wt, zero.policy = TRUE, nsim=999)
gmc_i_2022

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.19793, observed rank = 994, p-value = 0.012
alternative hypothesis: two.sided

The statistical report for 2022 above show that the p-value is lower than alpha value of 0.05. Hence, we have enough statistical evidence to reject the null hypothesis that the spatial distribution of drug use cases resemble random distribution (i.e. independent from spatial). Because the Moran’s I statistics is greater than 0 but not close to 1, we can infer that the spatial distribution shows weak signs of clustering. Overall, this shows a weak but significant positive spatial autocorrelation in drug use cases.

Note
  • Over the years, we can observe that the Moran’s I statistics is displaying an increasing trend all the way until 2020 when it decreased and then reaching an all-time high in 2021.

  • We can also observe that the drug use cases in 2021 and 2022 no longer resemble random distribution, indicating significant spatial autocorrelation. This change suggests that drug use is becoming increasingly concentrated in specific geographic areas, warranting further investigation into the factors contributing to this clustering.

  • The observed clustering, particularly in 2021 and 2022, suggests that policymakers should consider region-specific interventions. Identifying hotspots for drug use can help allocate resources more effectively and develop targeted prevention and treatment programs.

3.4 Visualising Monte Carlo Moran’s I

In order to examine the simulated Moran’s I test statistics in greater detail, we can collate the descriptive statistics of the test and plot the distribution of the statistical values as a histogram by using the code chunk below.

In the code chunk below summary(), var(), hist() and abline() are used.

Click to view the code
# Set up the plotting area for 2 columns and 3 rows
par(mfrow = c(3, 2)) 

# Histogram for 2017
hist(gmc_i_2017$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2017")
abline(v = 0, col = "red")

# Histogram for 2018
hist(gmc_i_2018$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2018")
abline(v = 0, col = "red")

# Histogram for 2019
hist(gmc_i_2019$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2019")
abline(v = 0, col = "red")

# Histogram for 2020
hist(gmc_i_2020$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2020")
abline(v = 0, col = "red")

# Histogram for 2021
hist(gmc_i_2021$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2021")
abline(v = 0, col = "red")

# Histogram for 2022
hist(gmc_i_2022$res, 
     freq = TRUE, 
     breaks = 20, 
     xlab = "Simulated Moran's I", 
     main = "Histogram for 2022")
abline(v = 0, col = "red")

# 2017
print(summary(gmc_i_2017$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.14280 -0.05100 -0.02108 -0.01248  0.02061  0.20105 
# 2018
print(summary(gmc_i_2018$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.14516 -0.05259 -0.01839 -0.01148  0.02319  0.26485 
# 2019
print(summary(gmc_i_2019$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.15737 -0.05925 -0.01691 -0.01110  0.02752  0.31845 
# 2020
print(summary(gmc_i_2020$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.15929 -0.06245 -0.02221 -0.01443  0.02685  0.33025 
# 2021
print(summary(gmc_i_2021$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.23519 -0.06342 -0.01758 -0.01355  0.03012  0.29476 
# 2022
print(summary(gmc_i_2022$res[1:999]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.21825 -0.05910 -0.01462 -0.01072  0.03411  0.30446 
# Calculating variances and storing them in variables
gmc_i_var_2017 <- var(gmc_i_2017$res[1:999])
gmc_i_var_2018 <- var(gmc_i_2018$res[1:999])
gmc_i_var_2019 <- var(gmc_i_2019$res[1:999])
gmc_i_var_2020 <- var(gmc_i_2020$res[1:999])
gmc_i_var_2021 <- var(gmc_i_2021$res[1:999])
gmc_i_var_2022 <- var(gmc_i_2022$res[1:999])

# Printing the variances
cat(sprintf("Variance of '2017': %f\n", gmc_i_var_2017))
Variance of '2017': 0.003137
cat(sprintf("Variance of '2018': %f\n", gmc_i_var_2018))
Variance of '2018': 0.003527
cat(sprintf("Variance of '2019': %f\n", gmc_i_var_2019))
Variance of '2019': 0.004564
cat(sprintf("Variance of '2020': %f\n", gmc_i_var_2020))
Variance of '2020': 0.004721
cat(sprintf("Variance of '2021': %f\n", gmc_i_var_2021))
Variance of '2021': 0.005436
cat(sprintf("Variance of '2022': %f\n", gmc_i_var_2022))
Variance of '2022': 0.005386
Note
  • Right skewed histogram: Majority of the data have negative spatial autocorrelation but there are some regions with significantly higher positive spatial autocorrelation (high drug use cases in certain neighborhoods, low or moderate in majority).
  • Negative Median: Indicates negative spatial autocorrelation, majority of provinces have low or moderate drug use)
  • Low Variance (Range from 0.003 to 0.0055): Suggests consistency across simulations, indicating that the presence of spatial patterns are not influenced by local conditions

4. Local Spatial Autocorrelation Analysis

4.1 Visualising Local Moran’s I & p-value of Local Moran’s I

In the code chunk below, we are are using local_moran() of sfdep package to performs Moran’s I statistical testing and tmap functions to prepare a choropleth map by using value in the p_ii_sim field.

For consistency sake, we will be sticking to p_ii_sim for the entire experiment.

Click to view the code
set.seed(1234)
lisa_2017 <- drug_use_2017_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2017 <- tm_shape(lisa_2017) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2017",
            main.title.size = 0.8)

map2_2017 <- tm_shape(lisa_2017) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2017",
            main.title.size = 0.8)

tmap_arrange(map1_2017, map2_2017, ncol = 2)

Click to view the code
set.seed(1234)
lisa_2018 <- drug_use_2018_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2018 <- tm_shape(lisa_2018) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2018",
            main.title.size = 0.8)

map2_2018 <- tm_shape(lisa_2018) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2018",
            main.title.size = 0.8)

tmap_arrange(map1_2018, map2_2018, ncol = 2)

Click to view the code
set.seed(1234)
lisa_2019 <- drug_use_2019_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2019 <- tm_shape(lisa_2019) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2019",
            main.title.size = 0.8)

map2_2019 <- tm_shape(lisa_2019) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2019",
            main.title.size = 0.8)

tmap_arrange(map1_2019, map2_2019, ncol = 2)

Click to view the code
set.seed(1234)
lisa_2020 <- drug_use_2020_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2020 <- tm_shape(lisa_2020) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2020",
            main.title.size = 0.8)

map2_2020 <- tm_shape(lisa_2020) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2020",
            main.title.size = 0.8)

tmap_arrange(map1_2020, map2_2020, ncol = 2)

Click to view the code
set.seed(1234)
lisa_2021 <- drug_use_2021_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2021 <- tm_shape(lisa_2021) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2021",
            main.title.size = 0.8)

map2_2021 <- tm_shape(lisa_2021) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2021",
            main.title.size = 0.8)

tmap_arrange(map1_2021, map2_2021, ncol = 2)

Click to view the code
set.seed(1234)
lisa_2022 <- drug_use_2022_sf %>% 
  mutate(local_moran = local_moran(
    no_cases, nb_list, gm_wt, nsim = 999, zero.policy=TRUE),
         .before = 1) %>%
  unnest(local_moran)

tmap_mode("plot")
map1_2022 <- tm_shape(lisa_2022) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Drug Use Cases in 2022",
            main.title.size = 0.8)

map2_2022 <- tm_shape(lisa_2022) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "P-value of Local Moran's I in 2022",
            main.title.size = 0.8)

tmap_arrange(map1_2022, map2_2022, ncol = 2)

4.2 Skewness of Lisa

In order to determine whether to use mean or median for the subsequent visualisation, we will need to check if the data has a normal distribution of if its skewed.

summary(lisa_2017$skewness)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-4.5356 -2.4136 -1.8628 -0.7379  1.8934  3.3641       1 
summary(lisa_2018$skewness)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-4.2038 -2.1536 -1.7613 -0.7909  1.5514  3.0979       1 
summary(lisa_2019$skewness)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-2.7683 -1.4886 -1.1985 -0.5408  1.0407  2.1595       1 
summary(lisa_2020$skewness)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-2.5457 -1.3493 -1.1252 -0.5833  0.9249  2.1810       1 
summary(lisa_2021$skewness)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-1.4815 -0.7478 -0.5488 -0.2148  0.5724  1.1561       1 
summary(lisa_2022$skewness)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
-0.97047 -0.51021 -0.32683 -0.06994  0.48088  0.85687        1 

The median of the skewness statistics are in the negative range, from -0.32683 to -1.8628 which is close to 0. Hence we will be using mean to perform classification.

4.3 Visualising Lisa Map

LISA map is a categorical map showing outliers and clusters.

Two Types of Outliers:

High-Low:

  • Areas that have high values (e.g., a high incidence of a certain event) that are surrounded by areas with low values. This pattern may suggest that the high value is an anomaly compared to the neighboring areas.

Low-High:

  • Areas that exhibit low values surrounded by areas with high values. This situation may suggest that the low value is an anomaly or an area of concern, indicating that it is isolated from surrounding regions with higher levels of the phenomenon.

Two Types of Clusters:

High-High:

  • Areas where high values are surrounded by other high values. This indicates a cluster of similar high values, suggesting positive spatial autocorrelation. This pattern may reflect shared influences or conditions contributing to the elevated values in those regions. For instance, areas with high rates of drug use that are geographically close may indicate a pervasive issue or common underlying factors.

Low-Low:

  • Areas where low values are surrounded by other low values. This pattern indicates a cluster of similar low values, suggesting that these areas consistently experience lower levels of the phenomenon in question.

The below chunk of code generates the lisa map for each year based on the parameter mean.

Click to view the code
lisa_sig_2017 <- lisa_2017 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2017) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_2017) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) +  # Change the color palette here
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2017", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2017 %>%  select(province, mean))
Simple feature collection with 7 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.01629 ymin: 13.17847 xmax: 101.9901 ymax: 17.74368
Geodetic CRS:  WGS 84
# A tibble: 7 × 3
  province       mean                                                   geometry
  <chr>          <fct>                                        <MULTIPOLYGON [°]>
1 SAMUT PRAKAN   Low-High  (((100.7306 13.71713, 100.7307 13.71681, 100.7313 13…
2 NONTHABURI     Low-High  (((100.3415 14.10079, 100.3415 14.10001, 100.3415 14…
3 CHACHOENGSAO   High-High (((101.0612 13.97613, 101.0625 13.976, 101.0629 13.9…
4 NAKHON SAWAN   Low-Low   (((100.0266 16.189, 100.0267 16.18889, 100.0268 16.1…
5 KAMPHAENG PHET Low-Low   (((99.48875 16.91044, 99.48883 16.91016, 99.48884 16…
6 PHITSANULOK    Low-Low   (((101.0033 17.7334, 101.0037 17.73339, 101.0043 17.…
7 SAMUT SAKHON   Low-High  (((100.3091 13.7217, 100.3091 13.72169, 100.3096 13.…
Click to view the code
lisa_sig_2018 <- lisa_2018 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2018) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_2018) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) +  # Change the color palette here
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2018", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2018 %>%  select(province, mean))
Simple feature collection with 5 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 100.0267 ymin: 13.17847 xmax: 103.4129 ymax: 18.30525
Geodetic CRS:  WGS 84
# A tibble: 5 × 3
  province     mean                                                     geometry
  <chr>        <fct>                                          <MULTIPOLYGON [°]>
1 SAMUT PRAKAN High-High (((100.7306 13.71713, 100.7307 13.71681, 100.7313 13.7…
2 NONTHABURI   Low-High  (((100.3415 14.10079, 100.3415 14.10001, 100.3415 14.0…
3 CHACHOENGSAO High-High (((101.0612 13.97613, 101.0625 13.976, 101.0629 13.976…
4 NONG KHAI    High-Low  (((103.2985 18.29698, 103.2984 18.29697, 103.2982 18.2…
5 SAMUT SAKHON Low-High  (((100.3091 13.7217, 100.3091 13.72169, 100.3096 13.72…
Click to view the code
lisa_sig_2019 <- lisa_2019 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2019) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(lisa_sig_2019) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2019", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2019 %>%  select(province, mean))
Simple feature collection with 4 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 100.2624 ymin: 13.17847 xmax: 103.4129 ymax: 18.30525
Geodetic CRS:  WGS 84
# A tibble: 4 × 3
  province     mean                                                     geometry
  <chr>        <fct>                                          <MULTIPOLYGON [°]>
1 SAMUT PRAKAN High-High (((100.7306 13.71713, 100.7307 13.71681, 100.7313 13.7…
2 NONTHABURI   Low-High  (((100.3415 14.10079, 100.3415 14.10001, 100.3415 14.0…
3 CHACHOENGSAO High-High (((101.0612 13.97613, 101.0625 13.976, 101.0629 13.976…
4 NONG KHAI    Low-Low   (((103.2985 18.29698, 103.2984 18.29697, 103.2982 18.2…
Click to view the code
lisa_sig_2020 <- lisa_2020 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2020) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(lisa_sig_2020) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2020", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2020 %>%  select(province, mean))
Simple feature collection with 3 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.08612 ymin: 13.17847 xmax: 101.9901 ymax: 16.19126
Geodetic CRS:  WGS 84
# A tibble: 3 × 3
  province     mean                                                     geometry
  <chr>        <fct>                                          <MULTIPOLYGON [°]>
1 SAMUT PRAKAN Low-High  (((100.7306 13.71713, 100.7307 13.71681, 100.7313 13.7…
2 CHACHOENGSAO High-High (((101.0612 13.97613, 101.0625 13.976, 101.0629 13.976…
3 NAKHON SAWAN Low-Low   (((100.0266 16.189, 100.0267 16.18889, 100.0268 16.188…
Click to view the code
lisa_sig_2021 <- lisa_2021 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2021) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(lisa_sig_2021) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2021", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2021 %>%  select(province, mean))
Simple feature collection with 11 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 98.18107 ymin: 12.56321 xmax: 104.825 ymax: 16.91058
Geodetic CRS:  WGS 84
# A tibble: 11 × 3
   province        mean                                                 geometry
   <chr>           <fct>                                      <MULTIPOLYGON [°]>
 1 SING BURI       Low-Low   (((100.3691 15.0894, 100.3697 15.0891, 100.3708 15…
 2 CHAI NAT        Low-Low   (((100.1199 15.41243, 100.121 15.41234, 100.1229 1…
 3 YASOTHON        High-High (((104.3952 16.34843, 104.3983 16.34707, 104.4 16.…
 4 NAKHON SAWAN    Low-Low   (((100.0266 16.189, 100.0267 16.18889, 100.0268 16…
 5 UTHAI THANI     Low-Low   (((99.13905 15.79655, 99.13918 15.79652, 99.13965 …
 6 KAMPHAENG PHET  Low-Low   (((99.48875 16.91044, 99.48883 16.91016, 99.48884 …
 7 RATCHABURI      Low-Low   (((99.8821 13.94977, 99.88218 13.94976, 99.88248 1…
 8 KANCHANABURI    Low-Low   (((98.58631 15.65465, 98.58662 15.65384, 98.58697 …
 9 SUPHAN BURI     Low-Low   (((99.37118 15.05073, 99.37454 15.0495, 99.3762 15…
10 SAMUT SONGKHRAM Low-Low   (((100.0116 13.51359, 100.0117 13.51359, 100.0117 …
11 PHETCHABURI     Low-Low   (((99.75869 13.34249, 99.75876 13.34248, 99.75883 …
Click to view the code
lisa_sig_2022 <- lisa_2022 %>%
  filter(p_ii_sim < 0.05)

tmap_mode("plot")
tm_shape(lisa_2022) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(lisa_sig_2022) +
  tm_fill("mean", palette = brewer.pal(4, "Set1")) + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "Lisa Map for 2022", main.title.size = 1.2, main.title.position = "center", legend.text.size = 1.2)

Click to view the code
print(lisa_sig_2022 %>%  select(province, mean))
Simple feature collection with 8 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.01629 ymin: 7.090332 xmax: 104.4353 ymax: 18.30525
Geodetic CRS:  WGS 84
# A tibble: 8 × 3
  province         mean                                                 geometry
  <chr>            <fct>                                      <MULTIPOLYGON [°]>
1 NONG BUA LAM PHU Low-High  (((102.2866 17.69207, 102.2867 17.69206, 102.2867 …
2 NONG KHAI        High-High (((103.2985 18.29698, 103.2984 18.29697, 103.2982 …
3 MAHA SARAKHAM    High-High (((103.1562 16.6425, 103.1567 16.64236, 103.1568 1…
4 KALASIN          High-High (((103.584 17.09981, 103.5845 17.09973, 103.5846 1…
5 SAKON NAKHON     High-High (((103.5404 18.06785, 103.5405 18.0677, 103.5405 1…
6 NAKHON SAWAN     Low-Low   (((100.0266 16.189, 100.0267 16.18889, 100.0268 16…
7 KAMPHAENG PHET   Low-Low   (((99.48875 16.91044, 99.48883 16.91016, 99.48884 …
8 PHATTHALUNG      Low-High  (((99.96416 7.90199, 99.9642 7.901912, 99.96425 7.…
Note

High-High Clusters

  • From 2017-2020, we can observe Chachoengsao being in the “High-High” category, suggesting that this province consistently has a high number of drug cases surrounded by other provinces with similarly high numbers. This type of spatial autocorrelation indicates a cluster of high drug use incidents in that region.

  • Possible Reason: A likely reason for this observation is the geographic location of this province. Chachoengsao is strategically located near major transportation routes, including highways and railways, facilitating the movement of drugs. Its proximity to Bangkok and other major urban centers may have also contribute to increased drug trafficking and distribution.

  • We can also observe that in the same time period, there exists different provinces (e.g Nonthaburi, Samut Prakan) close by in the “Low-High” category west of Chachoengsao, suggesting that these provinces are outliers with low drug uses surrounded by provinces with high drug uses.

  • For 2021, we can observe that the “High-High” cluster has moved to the eastern region of Thailand to Yasothon.

  • In the subsequent year 2022 however, this “High-High” cluster has moved slightly north of Yasothon and now includes 4 provinces: Nong Khai, Maha Sarakham Kalasin and Sakon Nakhon.

  • This indicates a continued eastward movement of drug use, reflecting ongoing dynamics in drug trafficking and consumption.

5. Hot Spot and Cold Spot Area Analysis (HCSA)

5.1 Computing local Gi* statistics

As usual, we will need to derive a spatial weight matrix before we can compute local Gi* statistics.

The code chunk below is used to firstly derive a spatial weight matrix by using sfdep functions and tidyverse approach and secondly, compute the local Gi*.

Click to view the code
set.seed(1234)
wm_idw_2017 <- drug_use_2017_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2017 <- wm_idw_2017 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)
write_rds(HCSA_2017, "data/rds/HCSA_2017.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi    var_gi std_dev  p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>     <dbl>   <dbl>    <dbl> <dbl>        <dbl>    <dbl>
 1   2.54  High    0.0283 0.0000123  0.0497  9.60e-1  0.92         0.46    0.860
 2   4.42  Low     0.0117 0.0000551  5.89    3.80e-9  0.02         0.01    2.72 
 3   2.72  Low     0.0121 0.0000499  2.96    3.12e-3  0.08         0.04    1.95 
 4   2.24  Low     0.0120 0.0000452  2.18    2.89e-2  0.14         0.07    2.51 
 5  -1.04  Low     0.0111 0.0000184 -0.943   3.46e-1  0.26         0.13    1.24 
 6  -1.06  Low     0.0100 0.0000295 -0.878   3.80e-1  0.24         0.12    2.31 
 7  -1.25  Low     0.0119 0.0000205 -1.24    2.17e-1  0.08         0.04    1.38 
 8  -1.30  Low     0.0123 0.0000530 -1.10    2.72e-1  0.08         0.04    1.45 
 9  -1.26  Low     0.0109 0.0000409 -1.12    2.61e-1  0.04         0.02    2.25 
10  -0.720 Low     0.0110 0.0000356 -0.475   6.35e-1  0.82         0.41    1.49 
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>
Note
Click to view the code
set.seed(1234)
wm_idw_2018 <- drug_use_2018_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2018 <- wm_idw_2018 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

write_rds(HCSA_2018, "data/rds/HCSA_2018.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi    var_gi std_dev  p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>     <dbl>   <dbl>    <dbl> <dbl>        <dbl>    <dbl>
 1  2.71   High    0.0257 0.0000119   0.532  5.95e-1  0.5          0.25    0.926
 2  4.30   High    0.0162 0.0000557   4.43   9.54e-6  0.04         0.02    2.62 
 3  3.01   Low     0.0130 0.0000261   3.80   1.43e-4  0.04         0.02    1.60 
 4  2.18   Low     0.0125 0.0000253   2.43   1.50e-2  0.06         0.03    2.94 
 5 -0.552  Low     0.0131 0.0000273  -0.543  5.87e-1  0.78         0.39    1.15 
 6 -0.833  Low     0.0119 0.0000393  -0.692  4.89e-1  0.56         0.28    1.39 
 7 -0.373  Low     0.0134 0.0000231  -0.442  6.58e-1  0.74         0.37    1.49 
 8 -1.15   Low     0.0117 0.0000268  -1.04   2.98e-1  0.16         0.08    2.05 
 9 -1.28   Low     0.0102 0.0000363  -0.898  3.69e-1  0.12         0.06    3.49 
10  0.0122 Low     0.0113 0.0000255   0.341  7.33e-1  0.54         0.27    1.54 
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>
Click to view the code
set.seed(1234)
wm_idw_2019 <- drug_use_2019_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2019 <- wm_idw_2019 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

write_rds(HCSA_2019, "data/rds/HCSA_2019.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi    var_gi std_dev  p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>     <dbl>   <dbl>    <dbl> <dbl>        <dbl>    <dbl>
 1  2.30   High    0.0223 0.0000126   0.467  6.41e-1  0.58         0.29    0.698
 2  3.81   High    0.0175 0.0000343   4.12   3.83e-5  0.04         0.02    1.68 
 3  2.63   Low     0.0126 0.0000300   2.82   4.74e-3  0.06         0.03    1.54 
 4  1.83   High    0.0135 0.0000187   1.91   5.55e-2  0.1          0.05    0.781
 5 -0.504  High    0.0126 0.0000200  -0.405  6.86e-1  0.78         0.39    1.39 
 6 -0.757  Low     0.0110 0.0000266  -0.464  6.43e-1  0.82         0.41    1.37 
 7  0.0342 Low     0.0137 0.0000269  -0.115  9.09e-1  0.8          0.4     2.22 
 8 -1.22   Low     0.0110 0.0000187  -1.02   3.06e-1  0.28         0.14    0.945
 9 -1.24   Low     0.0124 0.0000353  -1.10   2.71e-1  0.12         0.06    1.14 
10  0.650  Low     0.0120 0.0000242   0.881  3.78e-1  0.32         0.16    1.50 
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>
Click to view the code
set.seed(1234)
wm_idw_2020 <- drug_use_2020_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2020 <- wm_idw_2020 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

write_rds(HCSA_2020, "data/rds/HCSA_2020.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi     var_gi std_dev p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>      <dbl>   <dbl>   <dbl> <dbl>        <dbl>    <dbl>
 1   1.34  High    0.0201 0.00000951  -0.380 7.04e-1  0.84         0.42    0.650
 2   2.98  Low     0.0114 0.0000251    4.43  9.23e-6  0.02         0.01    2.07 
 3   1.71  Low     0.0115 0.0000193    2.39  1.68e-2  0.08         0.04    0.976
 4   1.21  Low     0.0125 0.0000212    1.26  2.07e-1  0.22         0.11    1.58 
 5  -1.08  Low     0.0126 0.0000185   -0.951 3.42e-1  0.36         0.18    0.839
 6  -1.09  Low     0.0106 0.0000166   -0.818 4.13e-1  0.44         0.22    1.21 
 7  -1.28  Low     0.0138 0.0000226   -1.19  2.32e-1  0.1          0.05    1.05 
 8  -1.39  Low     0.0111 0.0000238   -0.979 3.27e-1  0.26         0.13    0.874
 9  -1.36  Low     0.0119 0.0000253   -1.21  2.24e-1  0.08         0.04    1.30 
10  -0.713 Low     0.0122 0.0000220   -0.571 5.68e-1  0.7          0.35    0.954
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>
Click to view the code
set.seed(1234)
wm_idw_2021 <- drug_use_2021_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2021 <- wm_idw_2021 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

write_rds(HCSA_2021, "data/rds/HCSA_2021.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi     var_gi std_dev p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>      <dbl>   <dbl>   <dbl> <dbl>        <dbl>    <dbl>
 1   0.289 High    0.0161 0.00000782  -0.734  0.463   0.48         0.24    0.405
 2   1.74  Low     0.0119 0.0000240    2.35   0.0185  0.06         0.03    1.44 
 3   0.538 High    0.0138 0.0000199    0.375  0.707   0.68         0.34    0.792
 4   0.482 Low     0.0125 0.0000134    0.633  0.527   0.5          0.25    0.708
 5  -1.68  Low     0.0122 0.0000133   -1.42   0.156   0.12         0.06    0.709
 6  -1.68  Low     0.0115 0.0000186   -1.44   0.149   0.14         0.07    0.721
 7  -0.904 Low     0.0123 0.0000102   -0.710  0.478   0.5          0.25    0.121
 8  -1.99  Low     0.0116 0.0000185   -1.58   0.114   0.04         0.02    0.611
 9  -2.00  Low     0.0111 0.0000187   -1.68   0.0924  0.02         0.01    0.834
10  -0.514 Low     0.0115 0.0000137   -0.184  0.854   0.94         0.47    1.04 
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>
Click to view the code
set.seed(1234)
wm_idw_2022 <- drug_use_2022_sf %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

HCSA_2022 <- wm_idw_2022 %>% 
  mutate(local_Gi = local_gstar_perm(
    no_cases, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)

write_rds(HCSA_2022, "data/rds/HCSA_2022.rds")
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
# A tibble: 77 × 17
   gi_star cluster   e_gi     var_gi std_dev p_value p_sim p_folded_sim skewness
     <dbl> <fct>    <dbl>      <dbl>   <dbl>   <dbl> <dbl>        <dbl>    <dbl>
 1 -0.308  High    0.0137 0.00000895  -0.558   0.577  0.54         0.27    0.192
 2  0.647  High    0.0135 0.0000177    0.650   0.515  0.54         0.27    0.626
 3  0.0640 Low     0.0121 0.0000122    0.325   0.745  0.82         0.41    0.210
 4 -0.575  Low     0.0130 0.00000827  -0.650   0.515  0.52         0.26    0.705
 5 -1.21   Low     0.0126 0.00000814  -1.14    0.254  0.28         0.14    0.357
 6 -1.40   Low     0.0116 0.0000117   -1.17    0.244  0.2          0.1     0.977
 7 -0.834  Low     0.0125 0.00000785  -0.677   0.499  0.6          0.3     0.221
 8 -1.75   Low     0.0116 0.00000943  -1.54    0.122  0.1          0.05    0.309
 9 -1.77   Low     0.0116 0.0000127   -1.55    0.122  0.1          0.05    0.203
10 -0.327  Low     0.0124 0.0000132   -0.156   0.876  0.98         0.49    0.455
# ℹ 67 more rows
# ℹ 8 more variables: kurtosis <dbl>, nb <nb>, wts <list>, province <chr>,
#   fiscal_year <dbl>, types_of_drug_offenses <chr>, no_cases <dbl>,
#   geometry <MULTIPOLYGON [°]>

5.2 Visualising Local HCSA

To visualise local HCSA, we can plot the gi* value and the p-value of each year side by side.

The chunk of code below uses tmap() to plot and visualize the spatial distribution of the Gi* statistic for the different years. For this section we have the maps of Gi* and the p-value arranged side by side for easier comparisons.

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2017) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Gi* of 2017",
            main.title.size = 0.8)
map2 <- tm_shape(HCSA_2017) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2017",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2018) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8))+
  tm_layout(main.title = "Gi* of 2018",
            main.title.size = 0.8)

map2 <- tm_shape(HCSA_2018) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2018",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2019) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8))+
  tm_layout(main.title = "Gi* of 2019",
            main.title.size = 0.8)

map2 <- tm_shape(HCSA_2019) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2019",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2020) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8))+
  tm_layout(main.title = "Gi* of 2020",
            main.title.size = 0.8)

map2 <- tm_shape(HCSA_2020) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2020",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2021) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8))+
  tm_layout(main.title = "Gi* of 2021",
            main.title.size = 0.8)

map2 <- tm_shape(HCSA_2021) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2021",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")
map1 <- tm_shape(HCSA_2022) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8))+
  tm_layout(main.title = "Gi* of 2022",
            main.title.size = 0.8)

map2 <- tm_shape(HCSA_2022) +
  tm_fill("p_value",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi* of 2022",
            main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)

5.3 Visualising Hot Spot & Cold Spot Areas

Here, we will plot the significant (i.e. p-values less than 0.05) hot spot and cold spot areas by using the appropriate tmap functions.

Click to view the code
tmap_mode("plot")

HCSA_2017_sig <- HCSA_2017  %>%
  filter(p_sim < 0.05)
HCSA_2018_sig <- HCSA_2018  %>%
  filter(p_sim < 0.05)

map1<- tm_shape(HCSA_2017) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2017_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2017",
            main.title.size = 0.8)
map2<- tm_shape(HCSA_2018) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2018_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2018",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")

HCSA_2019_sig <- HCSA_2019  %>%
  filter(p_sim < 0.05)
HCSA_2020_sig <- HCSA_2020  %>%
  filter(p_sim < 0.05)

map1<- tm_shape(HCSA_2019) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2019_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2019",
            main.title.size = 0.8)
map2<- tm_shape(HCSA_2020) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2020_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2020",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)

Click to view the code
tmap_mode("plot")

HCSA_2021_sig <- HCSA_2021  %>%
  filter(p_sim < 0.05)
HCSA_2022_sig <- HCSA_2022  %>%
  filter(p_sim < 0.05)

map1<- tm_shape(HCSA_2021) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2021_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2021",
            main.title.size = 0.8)
map2<- tm_shape(HCSA_2022) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(HCSA_2022_sig) +
  tm_fill("gi_star") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "HCSA of 2022",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)

6. Conclusion

To conclude,